Spoken Malay Language Influence on Automatic Transcription and Segmentation

Authors

Husniza Husni

Yuhanis Yusof

Siti Sakira Kamaruddin

Abstract

Incorporating the influence of spoken Malay into a Malay speech lexicon is potentially useful for more accurate transcription and segmentation. The difficulty lies in locating the boundaries between similar-sounding phonemes during segmentation, especially in the read speech of dyslexic children, where each phoneme is influenced by its neighbours (the phonemes before and after it) and so becomes harder to distinguish. This paper therefore explores the need to model spoken Malay in the read-speech lexical model, taking a context-dependent model into consideration. Modeling spoken Malay in the lexical model can potentially yield better transcription of speech data containing reading errors that are highly similar phonetically.
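To illustrate what a context-dependent unit looks like, the following is a minimal sketch, not the authors' method. It expands a phoneme sequence into triphones, each naming the phoneme together with its left and right neighbours in the `left-center+right` notation commonly used by HMM toolkits. The function name `to_triphones`, the `sil` boundary symbol, and the pronunciation given for the Malay word "buku" are assumptions made for illustration only.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phoneme sequence into context-dependent triphone labels.

    Each label has the form left-center+right, so the same center phoneme
    gets a different unit depending on its neighbours.
    `boundary` is a hypothetical symbol standing in for silence at word edges.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Illustrative (assumed) phoneme sequence for the Malay word "buku".
print(to_triphones(["b", "u", "k", "u"]))
# prints ['sil-b+u', 'b-u+k', 'u-k+u', 'k-u+sil']
```

Note that the two occurrences of `u` become distinct units (`b-u+k` and `k-u+sil`), which is the sense in which a context-dependent model accounts for the surrounding phonemes.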

Similar resources

Concept segmentation and labeling for conversational speech

Spoken Language Understanding performs automatic concept labeling and segmentation of speech utterances. For this task, many approaches have been proposed based on both generative and discriminative models. While all these methods have shown remarkable accuracy on manual transcription of spoken utterances, robustness to noisy automatic transcription is still an open issue. In this paper we stud...

Impact of audio segmentation and segment clustering on automated transcription accuracy of large spoken archives

This paper addresses the influence of audio segmentation and segment clustering on automatic transcription accuracy for large spoken archives. The work forms part of the ongoing MALACH project, which is developing advanced techniques for supporting access to the world’s largest digital archive of video oral histories collected in many languages from over 52000 survivors and witnesses of the Hol...

An HMM-based system for automatic segmentation and alignment of speech

A system for automatic time-aligned phone transcription of spoken Swedish has been developed. Using a speech recording and an orthographic transcription of the words spoken in the recording the system is able to generate a phone-level segmentation without manual intervention. The system uses a technique based on Hidden Markov Models to position 85.5% of all boundary positions within 20 ms of ma...

The SI TEDx-UM speech database: a new Slovenian Spoken Language Resource

This paper presents a new Slovenian spoken language resource built from TEDx Talks. The speech database contains 242 talks in total duration of 54 hours. The annotation and transcription of acquired spoken material was generated automatically, applying acoustic segmentation and automatic speech recognition. The development and evaluation subset was also manually transcribed using the guidelines...

Sixth International Joint Conference on Natural Language Processing Proceedings of the Fourth Workshop on South and Southeast Asian Natural Language Processing

This paper deals with the fast bootstrapping of Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR), and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban spoken in Sarawak and in several parts of the Borneo Island) for which no res...

Publication date: 2013